Second-Order Step-Size Tuning of SGD for Non-Convex Optimization

Abstract

In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. To do so, one estimates curvature based on a local quadratic model, using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set, and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach: for such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than RMSprop or ADAM.
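As a rough illustration of the idea (a sketch only, not the authors' exact Step-Tuned SGD; the function name, safeguards, and all constants below are assumptions), the following Python snippet tunes the SGD step size with a Barzilai-Borwein (BB1) ratio whose curvature estimate comes from two noisy gradients evaluated on the same mini-batch.

import numpy as np

def bb_tuned_sgd(grad, x0, batches, alpha0=0.1, alpha_min=1e-4, alpha_max=1.0):
    # Sketch: SGD whose step size is tuned by a Barzilai-Borwein (BB1) ratio.
    # grad(x, batch) returns a mini-batch gradient of the loss at x.
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    for batch in batches:
        g = grad(x, batch)
        x_new = x - alpha * g            # tentative SGD step
        g_new = grad(x_new, batch)       # second gradient on the SAME batch
        s, y = x_new - x, g_new - g
        curv = s @ y                     # <s, y> approximates s^T H s locally
        if curv > 1e-12:                 # trust only positive curvature
            alpha = float(np.clip((s @ s) / curv, alpha_min, alpha_max))
        x = x_new
    return x

# Toy usage: mini-batch least squares.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(256, 10)), rng.normal(size=256)
grad = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x_hat = bb_tuned_sgd(grad, np.zeros(10), [rng.integers(0, 256, size=32) for _ in range(500)])

Evaluating both gradients on one mini-batch keeps the difference y free of batch-to-batch noise, which is what makes a curvature ratio built from noisy gradients usable at all.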

Similar Articles

Natasha 2: Faster Non-Convex Optimization Than SGD

We design a stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^{-3.25}) backpropagations. The best previous result was essentially O(ε^{-4}), achieved by SGD. More broadly, the algorithm finds ε-approximate local minima of any smooth nonconvex function at rate O(ε^{-3.25}), with only oracle access to stochastic gradients. ...

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Second-order methods, which utilize gradients as well as Hessians to optimize a given function, are of major importance in mathematical optimization. In this work, we study the oracle complexity of such methods, or equivalently, the number of iterations required to optimize a function to a given accuracy. Focusing on smooth and convex functions, we derive (to the best of our knowledge) the firs...

Second order sensitivity analysis for shape optimization of continuum structures

This study focuses on the optimization of the plane structure. Sequential quadratic programming (SQP) will be utilized, which is one of the most efficient methods for solving nonlinearly constrained optimization problems. A new formulation for the second order sensitivity analysis of the two-dimensional finite element will be developed. All the second order required derivatives will be calculat...
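For a concrete feel of SQP in practice (an illustrative toy problem, not the paper's finite-element shape-optimization setting; everything below is an assumed example), SciPy's SLSQP implementation handles exactly this class of nonlinearly constrained problem.

import numpy as np
from scipy.optimize import minimize

# Objective: squared distance to the point (1, 2.5).
objective = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2
# One nonlinear inequality constraint, written as g(x) >= 0 per SciPy's convention.
cons = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] ** 2 - x[1] ** 2}]

res = minimize(objective, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)  # the projection of (1, 2.5) onto the disk x0^2 + x1^2 <= 2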

Sparse Second Order Cone Programming Formulations for Convex Optimization Problems

Second order cone program (SOCP) formulations of convex optimization problems are studied. We show that various SOCP formulations can be obtained depending on how auxiliary variables are introduced. An efficient SOCP formulation that increases the computational efficiency is presented by investigating the relationship between the sparsity of an SOCP formulation and the sparsity of the Schur com...
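To make the auxiliary-variable device concrete, here is a minimal hedged sketch (random data; cvxpy is an assumed modeling tool, not one the paper uses): an epigraph variable t turns a norm objective into a single second-order cone constraint, giving one of the many possible SOCP formulations.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)

x = cp.Variable(5)
t = cp.Variable()  # auxiliary epigraph variable
# minimize ||A x - b||_2  becomes  minimize t  s.t.  ||A x - b||_2 <= t
prob = cp.Problem(cp.Minimize(t), [cp.SOC(t, A @ x - b)])
prob.solve()
print(x.value, t.value)  # t.value equals ||A x - b||_2 at the optimum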

A second-order pruning step for verified global optimization

We consider pruning steps used in a branch-and-bound algorithm for verified global optimization. A first-order pruning step was given by Ratz using automatic computation of a first-order slope tuple [21, 22]. In this paper, we introduce a second-order pruning step which is based on automatic computation of a second-order slope tuple. We add this second-order pruning step to the algorithm of Ratz. Fu...
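As a loose illustration of such a pruning step (a plain Lipschitz bound on f' stands in for the paper's slope tuples, and the test function and constants are assumptions), the sketch below discards sub-intervals of a branch-and-bound search whose first-order lower bound cannot beat the incumbent.

import math

def minimize_bb(f, lo, hi, lip, tol=1e-6):
    # Branch-and-bound over [lo, hi] with a first-order pruning test:
    # for x in [a, b], f(x) >= f(m) - lip * (b - a) / 2 around the midpoint m.
    best = min(f(lo), f(hi))               # incumbent upper bound on the minimum
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        m = 0.5 * (a + b)
        fm = f(m)
        best = min(best, fm)
        if fm - lip * (b - a) / 2 > best or (b - a) < tol:
            continue                        # prune: no better point can hide here
        stack.extend([(a, m), (m, b)])
    return best

# Example: a multimodal function on [-3, 3]; |g'| <= 3.6 there, so lip=4 is safe.
g = lambda x: math.sin(3 * x) + 0.1 * x * x
print(minimize_bb(g, -3.0, 3.0, lip=4.0))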

Journal

Journal title: Neural Processing Letters

Year: 2022

ISSN: 1573-773X, 1370-4621

DOI: https://doi.org/10.1007/s11063-021-10705-5